AITopics | failure prediction

Failure Prediction at Runtime for Generative Robot Policies

Neural Information Processing Systems | Jun-14-2026, 14:47:41 GMT

Imitation learning (IL) with generative models, such as diffusion and flow matching, has enabled robots to perform complex, long-horizon tasks. However, distribution shifts from unseen environments or compounding action errors can still cause unpredictable and unsafe behavior, leading to task failure. Therefore, early failure prediction during runtime is essential for deploying robots in human-centered and safety-critical environments. We propose FIPER, a general framework for Failure Prediction at Runtime for generative IL policies that does not require failure data. FIPER identifies two key indicators of impending failure: (i) out-of-distribution (OOD) observations detected via random network distillation in the policy's embedding space, and (ii) high uncertainty in generated actions measured by a novel action-chunk entropy score.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.50)

Failure Prediction at Runtime for Generative Robot Policies

Neural Information Processing Systems | Jun-10-2026, 03:49:56 GMT

Imitation learning (IL) with generative models, such as diffusion and flow matching, has enabled robots to perform complex, long-horizon tasks. However, distribution shifts from unseen environments or compounding action errors can still cause unpredictable and unsafe behavior, leading to task failure. Therefore, early failure prediction during runtime is essential for deploying robots in human-centered and safety-critical environments. We propose FIPER, a general framework for Failure Prediction at Runtime for generative IL policies that does not require failure data. FIPER identifies two key indicators of impending failure: (i) out-of-distribution (OOD) observations detected via random network distillation in the policy's embedding space, and (ii) high uncertainty in generated actions measured by a novel action-chunk entropy score.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (0.88)
Information Technology > Artificial Intelligence > Machine Learning (0.59)

Addressing Failure Prediction by Learning Model Confidence

Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez

Neural Information Processing Systems | Feb-12-2026, 15:18:28 GMT

Neural Information Processing Systems http://nips.cc/

failure prediction, neural network, prediction, (17 more...)

Neural Information Processing Systems

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

757f843a169cc678064d9530d12a1881-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-12-2026, 15:18:20 GMT

confidence estimation, confidnet, prediction, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Addressing Failure Prediction by Learning Model Confidence

Neural Information Processing Systems | Dec-25-2025, 14:26:51 GMT

Reliably assessing the confidence of a deep neural network and predicting its failures is of primary importance for the practical deployment of these models. In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show that TCP is better suited than the classic Maximum Class Probability (MCP). We additionally provide theoretical guarantees for TCP in the context of failure prediction. Since the true class is by essence unknown at test time, we propose to learn the TCP criterion on the training set, introducing a specific learning scheme adapted to this context. Extensive experiments are conducted to validate the relevance of the proposed approach. We study various network architectures and small- and large-scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction.
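
The difference between the two criteria named in this abstract can be illustrated with a small sketch. The softmax outputs and labels below are made-up values chosen for illustration, and the helper names `mcp` and `tcp` are hypothetical; this is not the authors' code, and it does not cover the learning scheme that estimates TCP when labels are unavailable.

```python
import numpy as np

def mcp(probs):
    """Maximum Class Probability: the largest softmax output per sample."""
    return probs.max(axis=1)

def tcp(probs, labels):
    """True Class Probability: the softmax output assigned to the true class."""
    return probs[np.arange(len(labels)), labels]

# Hypothetical softmax outputs for three samples over three classes.
probs = np.array([
    [0.70, 0.20, 0.10],  # predicted class 0, true class 0 (correct)
    [0.60, 0.30, 0.10],  # predicted class 0, true class 1 (confident error)
    [0.40, 0.35, 0.25],  # predicted class 0, true class 0 (correct, low margin)
])
labels = np.array([0, 1, 0])

mcp_scores = mcp(probs)
tcp_scores = tcp(probs, labels)

# Under MCP the misclassified sample looks more trustworthy than the
# low-margin correct one; under TCP it receives the lowest score.
print("MCP:", mcp_scores)
print("TCP:", tcp_scores)
```

In this toy case MCP ranks the error above a correct prediction, whereas TCP separates them. Because TCP needs the true label, it can only be computed directly on labelled data, which is why the abstract proposes learning it on the training set.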

failure prediction, learning model confidence, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction

Yang, Yifan, Duan, Zhixiang, Xie, Tianshi, Cao, Fuyu, Shen, Pinxi, Song, Peili, Jin, Piaopiao, Sun, Guokang, Xu, Shaoqing, You, Yangwei, Liu, Jingtai

arXiv.org Artificial Intelligence | Dec-4-2025

Robotic manipulation is a fundamental component of automation. However, traditional perception-planning pipelines often fall short in open-ended tasks due to limited flexibility, while the architecture of a single end-to-end Vision-Language-Action (VLA) offers promising capabilities but lacks crucial mechanisms for anticipating and recovering from failure. To address these challenges, we propose FPC-VLA, a dual-model framework that integrates VLA with a supervisor for failure prediction and correction. The supervisor evaluates action viability through vision-language queries and generates corrective strategies when risks arise, trained efficiently without manual labeling. A dual-stream fusion module further refines actions by leveraging past predictions. Evaluation results on multiple simulation platforms (SIMPLER and LIBERO) and robot embodiments (WidowX, Google Robot, Franka) show that FPC-VLA outperforms state-of-the-art models in both zero-shot and fine-tuned settings. Successful real-world deployments on diverse, long-horizon tasks confirm FPC-VLA's strong generalization and practical utility for building more reliable autonomous systems.

artificial intelligence, large language model, natural language, (12 more...)

arXiv.org Artificial Intelligence

2509.04018

Country: Asia > China (0.46)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Enhancing failure prediction in nuclear industry: Hybridization of knowledge- and data-driven techniques

Saley, Amaratou Mahamadou, Moyaux, Thierry, Sekhari, Aïcha, Cheutet, Vincent, Danielou, Jean-Baptiste

arXiv.org Artificial Intelligence | Nov-18-2025

The convergence of the Internet of Things (IoT) and Industry 4.0 has significantly enhanced data-driven methodologies within the nuclear industry, notably improving safety and economic efficiency. This advancement brings the challenge of precisely predicting future maintenance needs for assets, which is crucial for reducing downtime and operational costs. However, data-driven methodologies in the nuclear sector require extensive domain knowledge to be effective, owing to the complexity of the systems involved. Thus, this paper proposes a novel predictive maintenance methodology that combines data-driven techniques with domain knowledge about nuclear equipment. The methodological originality of this paper lies on two levels: highlighting the limitations of purely data-driven approaches and demonstrating the importance of knowledge in enhancing the performance of the predictive models. The applicative novelty of this work lies in its use in the nuclear industry, a domain that is highly restricted and ultrasensitive due to security, economic and environmental concerns. A detailed real-world case study, which compares the current state of equipment monitoring with two scenarios, demonstrates that the methodology significantly outperforms purely data-driven methods in failure prediction. While purely data-driven methods achieve only modest performance, with a prediction horizon limited to 3 h and an F1 score of 56.36%, the hybrid approach increases the prediction horizon to 24 h and achieves a higher F1 score of 93.12%.

data mining, knowledge management, machine learning, (22 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.cie.2025.111387

2511.11604

Country:

Asia (0.92)
Europe > France (0.46)
Africa > Middle East > Tunisia (0.28)

Genre:

Overview (1.00)
Workflow (0.93)
Research Report > New Finding (0.67)

Industry: Energy > Power Industry > Utilities > Nuclear (1.00)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
(7 more...)

Failure Prediction at Runtime for Generative Robot Policies

Römer, Ralf, Kobras, Adrian, Worbis, Luca, Schoellig, Angela P.

arXiv.org Artificial Intelligence | Oct-14-2025

Imitation learning (IL) with generative models, such as diffusion and flow matching, has enabled robots to perform complex, long-horizon tasks. However, distribution shifts from unseen environments or compounding action errors can still cause unpredictable and unsafe behavior, leading to task failure. Early failure prediction during runtime is therefore essential for deploying robots in human-centered and safety-critical environments. We propose FIPER, a general framework for Failure Prediction at Runtime for generative IL policies that does not require failure data. FIPER identifies two key indicators of impending failure: (i) out-of-distribution (OOD) observations detected via random network distillation in the policy's embedding space, and (ii) high uncertainty in generated actions measured by a novel action-chunk entropy score. Both failure prediction scores are calibrated using a small set of successful rollouts via conformal prediction. A failure alarm is triggered when both indicators, aggregated over short time windows, exceed their thresholds. We evaluate FIPER across five simulation and real-world environments involving diverse failure modes. Our results demonstrate that FIPER better distinguishes actual failures from benign OOD situations and predicts failures more accurately and earlier than existing methods. We thus consider this work an important step towards more interpretable and safer generative robot policies. Code, data and videos are available at https://tum-lsy.github.io/fiper_website.
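
The calibration and alarm logic described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the FIPER implementation: the split-conformal quantile rule, the trailing-mean aggregation, the window length, and all function names (`conformal_threshold`, `window_mean`, `first_alarm`) are hypothetical choices, and the scores are synthetic stand-ins for the OOD and action-chunk entropy scores.

```python
import numpy as np

def conformal_threshold(calib_scores, alpha=0.1):
    """Threshold from scores of successful rollouts (assumed split-conformal rule).

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or +inf when the calibration set is too small for the requested alpha.
    """
    scores = np.sort(np.asarray(calib_scores, dtype=float))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # 1-indexed rank
    return np.inf if k > n else scores[k - 1]

def window_mean(scores, window):
    """Trailing mean over the last `window` steps (shorter at the start)."""
    scores = np.asarray(scores, dtype=float)
    return np.array([scores[max(0, t - window + 1): t + 1].mean()
                     for t in range(len(scores))])

def first_alarm(ood_scores, entropy_scores, ood_thr, ent_thr, window=3):
    """First time step at which BOTH aggregated scores exceed their thresholds."""
    ood = window_mean(ood_scores, window)
    ent = window_mean(entropy_scores, window)
    hits = np.flatnonzero((ood > ood_thr) & (ent > ent_thr))
    return int(hits[0]) if hits.size else None

# Synthetic calibration scores, one per successful rollout (illustrative only).
rng = np.random.default_rng(0)
ood_thr = conformal_threshold(rng.normal(1.0, 0.1, size=50), alpha=0.1)
ent_thr = conformal_threshold(rng.normal(2.0, 0.2, size=50), alpha=0.1)

# A synthetic test rollout whose scores drift upward over time.
ood_run = np.linspace(0.9, 2.0, 30)
ent_run = np.linspace(1.8, 4.0, 30)
print("alarm at step:", first_alarm(ood_run, ent_run, ood_thr, ent_thr))
```

Requiring both indicators to exceed their thresholds mirrors the abstract's statement that an alarm is triggered only when both do, which is what lets a benign OOD observation with low action uncertainty pass without an alarm.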

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.09459

Genre: Research Report > New Finding (1.00)

Technology: